We present a signal-from-background separation study based on a neural network technique applied to a
W/quartz fiber calorimeter. Performance results are presented in terms of signal efficiency and improvement of the
signal-to-background ratio. We conclude that by using neural networks we can efficiently
separate signal from background and achieve a signal enhancement over the background of the order of
several thousand at high efficiency.



1 Introduction 

Neural networks are widely used in scientific and commercial applications because they generally
perform better than traditional statistical approaches and have a relatively simple
operating principle. In high energy physics they are commonly used in various pattern
recognition problems, e.g. quark and gluon jet identification [?], top quark [?] and
Higgs boson searches, track finding [?], triggering [?], data mining, and general
classification tasks. (An introduction to neural networks and a review of their applications can be found
in [?].)

In this paper we present a signal-from-background separation study based on a neural network
technique applied to a W/quartz fiber calorimeter. We present the performance in terms of signal efficiency
and improvement of the signal-to-background ratio for various calorimeter depths, read-out
frequencies and total numbers of channels.

The paper is organized as follows: in section 2 we give a general introduction to neural networks
and related topics. The detector and its physics objectives are described in section 3. Section 4
contains the analysis steps and the results. We summarize and conclude in section 5.



2 Neural networks 



2.1 General 

A neural network (NN) is a simplified mathematical structure inspired by real biological
neural networks and their way of learning from experience, acquiring knowledge and solving
problems. The basic units of a biological network are the neurons, which are interconnected through
synapses and exchange signals. In general, a neuron produces an output signal that depends on the
signals it receives from the other neurons. Of great importance is the fact that the output signal
depends non-linearly on the input, whereas the input signal is approximately the linear sum
(the "coefficients" are determined by the synaptic strengths) of all the signals that are received by
the neuron simultaneously. The human brain consists of $O(10^{12})$ neurons, where each neuron is
connected to a number, from $O(1)$ to $O(10^{5})$, of other neurons. The whole structure is of immense
complexity and plasticity, and thus capability.

An NN retains the basic concepts of a real biological neural network (neuron, connection strength,
input linearity, output non-linearity) but at a much more modest level of complexity. The
neuron is a mathematical entity with a real value that depends on the connection strengths
(weights) and on the values of the other neurons to which it is connected. The non-linear function
that relates the output of the neuron to the weights and the inputs to the neuron is usually
called the activation function and has a simple sigmoid shape bounded to values in the interval [0,1] or
[-1,1].

A neural network can be structured or self-organized. In a common structured topology the neurons
are organized in layers: the input layer, through which the NN is fed with the input variables of
the problem to be solved, is followed by a number of so-called hidden layers and finally by
the output layer. When the information flows in one direction only, from the input
layer towards the output layer, we speak of feed-forward neural networks. With bidirectional
connections we have feed-back NNs. When the output layer is fed back into the input layer
we have recurrent networks.

An NN is trained with a set of examples that are representative of the problem. During training
the weights, which are the state variables of the network and determine its behavior, are adapted
to the presented examples. In this way the NN acquires knowledge of the rule that produced
the examples and can then generalize. Generalization means that the network can be used
on real events to perform the task it was trained for. Its generalization ability is usually tested
with a set of test events whose properties are known. This procedure validates
the network's performance, after which it is ready to be used on real events.

The training procedure can be supervised or unsupervised. Supervised training is accomplished by
presenting input-output pairs to the NN. The NN calculates its output according to its weights,
and this output is compared to the given desired output. The weights are updated in such a way that
the NN produces an output as close as possible to the desired one. In unsupervised training the
network receives only a set of input examples without any output labeling. In this case the task
is to detect some as yet unknown structure that the real events may have.

In the following we concentrate on one of the most commonly used feed-forward neural networks,
the multilayer perceptron. It is usually trained with the back-propagation algorithm to
perform event classification tasks. We describe its architecture and present the mathematical
background of the training algorithm. First we briefly discuss the problem of pattern recognition.

2.2 Pattern recognition 

Pattern recognition for event classification is a common problem in high energy
physics (e.g. particle identification by shower dimension and shape, electron/hadron discrimination,
trigger signal generation, quark and gluon jet classification etc.). In all cases, the problem
consists in defining a procedure, to be followed in online or offline analysis, that is
able to recognize events and categorize them based on their features. A feature, or pattern, is
the set of properties that characterizes a class of events and by which this class can
possibly be discriminated from the others. The difficulty is to reveal characteristic patterns from
the measured quantities. The whole procedure therefore amounts to finding and properly using quantities, or
combinations of them, that allow correct event classification. In an approach without NNs,
one usually tries to perform classification by imposing a set of one-dimensional cuts on various
selected measured variables that may characterize the events of interest. Such cuts are usually
determined by examining single-variable probability distributions for the events of interest and for
the others (see e.g. [20]). When the complexity of the problem increases, better results can be
obtained by employing methods that simultaneously exploit correlations among variables.
In this case a procedure with a high degree of parallelism is required. Neural networks are built
by design with this kind of parallelism (neurons), and are therefore often used for pattern
recognition tasks with outstanding performance.

Figure 1: a multilayer perceptron with one hidden layer.
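
As a minimal illustration of why correlations among variables matter, the following Python sketch compares a pair of one-dimensional cuts with a single cut on a linear combination of two strongly correlated variables. The toy event model, the cut values and all function names are assumptions made for this example only; they are not taken from the analysis described in this paper.

```python
import random

def make_events(n, shift, rng):
    """Toy events: both variables share a common term t, so they are strongly
    correlated. Signal (shift = 0.5) differs from background (shift = 0)
    only along the direction x - y."""
    events = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        x = t + shift + 0.1 * rng.gauss(0.0, 1.0)
        y = t - shift + 0.1 * rng.gauss(0.0, 1.0)
        events.append((x, y))
    return events

def accuracy(select, signal, background):
    """Fraction of events classified correctly by the boolean selector."""
    correct = sum(1 for e in signal if select(e))
    correct += sum(1 for e in background if not select(e))
    return correct / (len(signal) + len(background))

rng = random.Random(0)
signal = make_events(2000, 0.5, rng)
background = make_events(2000, 0.0, rng)

# Two one-dimensional cuts, each applied to a single variable.
acc_cuts = accuracy(lambda e: e[0] > 0.0 and e[1] < 0.0, signal, background)
# One cut on a combination of the variables that exploits their correlation.
acc_combined = accuracy(lambda e: e[0] - e[1] > 0.5, signal, background)
print(acc_cuts, acc_combined)
```

In this toy model the single cut on x - y should classify nearly all events correctly, whereas the two one-dimensional cuts cannot, because each variable taken on its own overlaps strongly between the two classes.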

2.3 Multilayer perceptron 

One of the most widely used NNs for pattern recognition tasks is the multilayer perceptron. It
is a feed-forward network with its neurons arranged in input, hidden and output layers (fig. 1).
Every neuron receives inputs from all neurons of the preceding layer and sends its output to all
neurons of the next layer, whereas there is no connection between neurons belonging to the same layer.
Every connection is associated with a weight parameter that is
optimized during the training phase. The neuron's output is given by the activation function
$f(x)$, a sigmoid function like eq. 3, which has values bounded to the interval [-1,1], or like eq. 4, with
values bounded to the interval [0,1]. The parameter $T$, usually called the temperature of the network,
determines how steep the sigmoid function is: the lower $T$, the steeper the sigmoid. In general the
final performance does not depend on $T$, and a value equal to 1 is the usual choice. The input layer
consists of a number of neurons equal to the number $N_{input}$ of input variables that are used for
the event classification.

In mathematical form, we define $I_i^k$ to be the input to the $i$th neuron in the $k$th
layer, which consists of $N_k$ neurons (*), $O_i^k$ to be the output of the $i$th neuron in the $k$th layer
and $w_{ji}^{k-1}$ to be the weight of the connection between the $j$th neuron in the $(k-1)$th layer and
the $i$th neuron in the $k$th layer. Then:

$$ I_i^k = \sum_{j=1}^{N_{k-1}} w_{ji}^{k-1} O_j^{k-1}, \quad k > 1 \qquad (1) $$

$$ O_i^k = f(I_i^k + \theta_i^k), \quad k > 1 \qquad (2) $$

(*) in this notation $k$ serves as an index and not as an exponent value



where $f(x)$ is a sigmoid function, e.g.

$$ f(x) = \tanh(x/T) \qquad (3) $$

or

$$ f(x) = \frac{1}{1 + e^{-x/T}} \qquad (4) $$

and $\theta_i^k$ is the so-called bias or threshold of the neuron. During the training phase it is simply
treated as a weight and is optimized in the same way.



For $k = 1$ (input layer):

$$ I_i^1 = \text{input variables}, \quad i = 1, \ldots, N_{input} \qquad (5) $$
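
The relations above can be sketched in Python as a forward pass through the layers. This is a minimal sketch under stated assumptions: the input neurons are taken to pass the input variables on unchanged, the nested-list layout of the weights and all function names are choices made for this example, and the weights of the small 2-3-1 network are arbitrary.

```python
import math

def f_tanh(x, T=1.0):
    """Sigmoid of eq. 3, bounded to [-1, 1]."""
    return math.tanh(x / T)

def f_logistic(x, T=1.0):
    """Sigmoid of eq. 4, bounded to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x / T))

def forward(inputs, weights, thresholds, f=f_logistic):
    """Propagate the input variables through the layers (eqs. 1 and 2).

    weights[k][j][i] is the weight from neuron j of one layer to neuron i of
    the next layer; thresholds[k][i] is the threshold of that neuron i.
    """
    outputs = list(inputs)  # assumption: input neurons pass the variables on unchanged
    for w, theta in zip(weights, thresholds):
        summed = [sum(w[j][i] * outputs[j] for j in range(len(outputs)))
                  for i in range(len(theta))]                  # eq. 1
        outputs = [f(s + th) for s, th in zip(summed, theta)]  # eq. 2
    return outputs

# Hypothetical 2-3-1 network with arbitrary weights, for illustration only.
weights = [
    [[0.5, -0.3, 0.8], [0.1, 0.7, -0.6]],  # input (2 neurons) -> hidden (3 neurons)
    [[1.0], [-1.0], [0.5]],                # hidden (3 neurons) -> output (1 neuron)
]
thresholds = [[0.0, 0.1, -0.1], [0.2]]
print(forward([1.0, 2.0], weights, thresholds))
```

Passing `f=f_tanh` switches to the activation function of eq. 3, and the optional argument `T` of either function shows how a lower temperature steepens the sigmoid.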



The output layer may consist of a single neuron, in which case the network is suitable for a two-class
separation problem. In general there may be $N_{output}$ neurons when the events should
be categorized in $2^{N_{output}}$ classes. In this case the $k$th bit in a binary representation of the class
number is determined by the value of the $k$th output neuron. Alternatively, the individual $N_{output}$
neurons could each be associated with one particular class among $N_{output}$ possible choices.
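
The two output encodings described above can be sketched as follows. The bit ordering (first output neuron as the least significant bit), the threshold of 0.5 for outputs bounded to [0,1], and the function names are assumptions of this example.

```python
def decode_binary(outputs, threshold=0.5):
    """Class number whose k-th bit is set when the k-th output neuron fires.
    Assumption: neuron 0 is the least significant bit."""
    return sum(1 << k for k, o in enumerate(outputs) if o > threshold)

def decode_one_per_class(outputs):
    """Class index of the output neuron with the largest value."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

# Three output neurons can label 2**3 = 8 classes in the binary encoding ...
print(decode_binary([0.9, 0.1, 0.8]))
# ... or 3 classes when each neuron stands for one class.
print(decode_one_per_class([0.1, 0.7, 0.2]))
```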

Concerning the number of hidden layers, a general rule is that no more than two hidden layers
should be needed [?]. Experience shows that, whether or not a multilayer perceptron with two hidden
layers performs satisfactorily, adding more hidden layers does not improve its performance.
In most cases a single hidden layer is sufficient. In fact, for problems of function
approximation, it has been shown [22], [?], [?] that a linear combination of sigmoids can approximate
any continuous function of one or more variables. This can evidently be achieved
with a network with one hidden layer (eq. 1).

Concerning the number of hidden neurons that an NN should have, there is no general rule
for the optimal number. In general the number of hidden neurons should be large
enough to ensure a high degree of classification, and small enough to ensure a high degree of
generalization. If there are too many hidden neurons, the network tends to learn to recognize
only the set of examples used for its training. Its generalization ability is then poor
and it does not perform well on a test sample and/or on real events. If the number of neurons
is too small, the network performs badly, since it is unable to learn to
classify during the training phase.

In the following we describe the back-propagation algorithm which is usually used for training a 
multilayer perceptron. 

2.4 Back-propagation algorithm 

The back-propagation algorithm is a supervised training method often used, with various modifications,
for training feed-forward networks. A set of examples is presented to the network, which
determines its output according to its state variables, the weights. The output is compared to the
desired target output with which each example is labeled. From the difference between the network's
output and the target, a so-called error or cost function is determined. It is a function
of the weights, and the algorithm updates these free parameters in such a way
that the error function is minimized, or in other words so that the network's output is as close as possible
to the target output. The minimization is done with a gradient descent method with respect to
the weights.

We describe the algorithm following the notation given in the previous subsection. We consider
a multilayer perceptron composed of $H + 2$ layers (an input layer, $H$ hidden layers and
an output layer). There are $N_{weights}$ parameters to be optimized (connection weights and thresholds
are both simply called weights), where

$$ N_{weights} = \sum_{k=2}^{H+2} (N_k + N_k N_{k-1}) \qquad (6) $$
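
Equation 6 is easy to evaluate for a given architecture; a minimal Python sketch follows. The function name and the example layer sizes are assumptions made for illustration.

```python
def count_weights(layer_sizes):
    """Number of free parameters of eq. 6: for every layer after the input
    layer, N_k thresholds plus N_k * N_(k-1) connection weights."""
    return sum(n_k + n_k * n_prev
               for n_prev, n_k in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 9 input variables, 10 hidden neurons, 1 output neuron.
print(count_weights([9, 10, 1]))  # (10 + 10*9) + (1 + 1*10) = 111
```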



The error function is defined as

$$ E(t) = \frac{1}{2} \sum_{n=1}^{N_{examples}} (O(n;t) - T(n))^2 \qquad (7) $$

where $O(n;t)$ is the output of the network corresponding to the $n$th example after the minimization
process has been iterated $t$ times, and $T(n)$ is the target output associated with the example. For
simplicity we consider the case of event classification between two classes (signal associated with
target output $T = 1$ and background with $T = 0$), so the output layer has only one neuron
(denoted as $O^{H+2}$ or simply $O$).

The weights are modified after each iteration according to

$$ w_{ji}^k(t+1) = w_{ji}^k(t) + \Delta w_{ji}^k(t+1), \quad k = 1, \ldots, H+1 \qquad (8) $$

with

$$ \Delta w_{ji}^k(t+1) = -\eta \frac{\partial E}{\partial w_{ji}^k} + \alpha \Delta w_{ji}^k(t) \qquad (9) $$

where the parameter $\eta$ (learning factor) determines the step of change, and thus the training rate,
and $\alpha$ (momentum coefficient) is a smoothing parameter that helps the method avoid getting
stuck around local minima of the error function.

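The partial derivative of the error function is given by the recurrence relations (event and iteration
indices are omitted for convenience)

$$ \frac{\partial E}{\partial w_{ji}^k} = D_i^{k+1} \cdot \left. \frac{df(x)}{dx} \right|_{x = I_i^{k+1} + \theta_i^{k+1}} \cdot O_j^k, \quad k = 1, \ldots, H+1 \qquad (10) $$

$$ D_j^{k+1} = \sum_{i=1}^{N_{k+2}} D_i^{k+2} \cdot \left. \frac{df(x)}{dx} \right|_{x = I_i^{k+2} + \theta_i^{k+2}} \cdot w_{ji}^{k+1}, \quad k = 1, \ldots, H \qquad (11) $$

and

$$ D^{H+2} = O^{H+2} - T \qquad (12) $$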



As can be seen in equations 11 and 12, the update information flows back from the last layer to
the previous ones (it is back-propagated). The weights of the neural network can be updated either
after each example presentation (online training), or after the whole set of examples has been
processed (batch training).
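
The standard back-propagation step can be sketched in Python for a network with one hidden layer and a single output neuron. This is a minimal sketch under stated assumptions: the logistic sigmoid with $T = 1$ is used, the parameters are stored in one flat list, training is done online, and the values of the learning factor, the momentum coefficient and the toy examples (logical OR) are arbitrary choices for illustration. It is not the training code used in this study.

```python
import math

def f(x):
    """Logistic sigmoid of eq. 4 with T = 1; its derivative is f(x) * (1 - f(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def unpack(p, n_in, n_hid):
    """Split the flat parameter list into weights and thresholds of both layers."""
    a, b = n_in * n_hid, n_in * n_hid + n_hid
    return p[:a], p[a:b], p[b:b + n_hid], p[b + n_hid]

def forward(p, x, n_hid):
    """Return the hidden-layer outputs and the network output for one example."""
    w1, th1, w2, th2 = unpack(p, len(x), n_hid)
    hid = [f(sum(w1[j * n_hid + i] * x[j] for j in range(len(x))) + th1[i])
           for i in range(n_hid)]
    out = f(sum(w2[i] * hid[i] for i in range(n_hid)) + th2)
    return hid, out

def error(p, examples, n_hid):
    """Error function of eq. 7."""
    return 0.5 * sum((forward(p, x, n_hid)[1] - t) ** 2 for x, t in examples)

def gradient(p, x, target, n_hid):
    """Gradient of the error for one example, following eqs. 10-12."""
    n_in = len(x)
    w2 = unpack(p, n_in, n_hid)[2]
    hid, out = forward(p, x, n_hid)
    d_out = out - target                                   # eq. 12
    delta_out = d_out * out * (1.0 - out)                  # D times df/dx at the output
    d_hid = [delta_out * w2[i] for i in range(n_hid)]      # eq. 11
    delta_hid = [d_hid[i] * hid[i] * (1.0 - hid[i]) for i in range(n_hid)]
    g_w1 = [delta_hid[i] * x[j] for j in range(n_in) for i in range(n_hid)]  # eq. 10
    g_w2 = [delta_out * hid[i] for i in range(n_hid)]                        # eq. 10
    # Thresholds are treated as weights fed by a constant input of 1.
    return g_w1 + delta_hid + g_w2 + [delta_out]

def train(p, examples, n_hid, eta=0.5, alpha=0.5, iterations=500):
    """Online training: the weights are updated after each example."""
    p = list(p)
    step = [0.0] * len(p)
    for _ in range(iterations):
        for x, target in examples:
            g = gradient(p, x, target, n_hid)
            step = [-eta * gi + alpha * si for gi, si in zip(g, step)]  # eq. 9
            p = [pi + si for pi, si in zip(p, step)]                    # eq. 8
    return p

# Toy two-class problem (logical OR), chosen only for illustration.
examples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
n_hid = 2
p0 = [0.1 * ((k % 5) - 2) for k in range(2 * n_hid + 2 * n_hid + 1)]
p1 = train(p0, examples, n_hid)
print(error(p0, examples, n_hid), error(p1, examples, n_hid))
```

Batch training would differ only in accumulating the gradient over all examples before applying eqs. 8 and 9 once per iteration.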

In general, we expect to reach a state in which the update differences $\Delta w_{ji}^k$ are zero or close
enough to zero. After the training phase, we validate the quality of the classification that the
network achieves with a test sample of events (generalization ability). The neural network is then
ready to classify real events, which it has never seen before, with known classification efficiency.



The main disadvantage of the algorithm presented above, often called the standard back-propagation
method, is that it converges very slowly to the minimum of the error function. Moreover,
the convergence time grows very fast with the complexity of the problem and the size of the
network. There are a few tens of different methods, based on the core features
of the standard one, that manage to speed up the overall performance significantly, at least
in most cases, e.g. by defining a different error function, changing the learning factor or the
network's temperature with iteration, using second order derivatives or adding extra terms, or
even by allowing the network to generate or destroy neurons (self-generation) in order to reach
better performance. For a rigorous description and references on several methods consult [16].

For any training method, a general rule for the number of example events ($N_{examples}$) to be
used in training is that it should be one or two orders of magnitude larger than the
number of weights ($N_{weights}$) that compose the neural network. This follows from the
fact that the generalization ability depends mainly on the ratio $N_{weights}/N_{examples}$. In fact, for
a multilayer perceptron with one hidden layer it has been shown that the generalization error is
of $O(N_{weights}/N_{examples})$ [?].

Special care is needed in deciding when the training phase should be stopped. The answer
is: whenever there is an indication of network over-training. This is the case where the network
starts learning to recognize only the set of training examples and, as a consequence, loses its
generalization ability. It can be avoided by performing training and testing in parallel and
inspecting the evolution of the error function for both samples with respect to the iteration of the
training algorithm. When the value of the error function on the testing examples starts to increase
while on the training examples it continues to decrease, there is a danger of over-training if we continue
the training procedure. There is no rule that associates a maximum number of iterations with
over-training or with best training. Generally, this depends on the complexity of the classification
problem, the complexity of the neural network, the training method and its learning factor. If a
network does not show satisfactory performance before over-training occurs, it means that
the neural network approach to that classification problem unfortunately fails.
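
The stopping criterion described above can be sketched as a check on the two error histories. The function name, the `patience` parameter (how many consecutive iterations the trend must persist before it is trusted) and the example error values are assumptions of this sketch.

```python
def overtraining_onset(train_errors, test_errors, patience=3):
    """Return the last iteration before the test error started to rise while
    the training error kept falling for `patience` consecutive iterations,
    or None if no such trend is found."""
    run = 0
    for t in range(1, len(test_errors)):
        rising_test = test_errors[t] > test_errors[t - 1]
        falling_train = train_errors[t] < train_errors[t - 1]
        run = run + 1 if (rising_test and falling_train) else 0
        if run == patience:
            return t - patience
    return None

# Made-up error histories: the test error has its minimum at iteration 3.
train_hist = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
test_hist = [1.1, 0.9, 0.7, 0.65, 0.7, 0.75, 0.8]
print(overtraining_onset(train_hist, test_hist))
```

The weights saved at the returned iteration would then be the ones kept for use on real events.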

We close this introduction to neural networks by adding another, somewhat abstract but
hopefully clearer, description of the neural network approach to pattern recognition
problems. The whole procedure can be considered as a fitting problem of $N_{examples}$ "points" on an
$N_{input} \times N_{output}$-dimensional "plane" with a "curve" (neural network) of $N_{weights}$ free parameters.
The best fitting "curve" (trained neural network) is simply represented by a function parameterized
by the weights (the free parameters that were optimized) that receives $N_{input}$ arguments (the
network's input variables) and returns a value (the result in the output layer) which approximates
the corresponding "point's position" (classification).



3 The CASTOR calorimeter 



3.1 Motivation 

Cosmic ray experiments have detected events with unusual properties, the so-called Centauro
events [25]-[30], which exhibit small particle multiplicity compared to normal hadronic events,
complete absence or strong suppression of the electromagnetic component ($N_{hadrons}/N_\gamma \gg 1$,
$E_{hadrons}/E_\gamma \gg 1$) and very high $\langle p_t \rangle$. Furthermore, a number of hadron-rich events are
accompanied by a strongly penetrating component observed in the form of halo [26], strongly
penetrating clusters [27] or long-living cascades, whose transition curves exhibit a characteristic
form with many maxima and slow attenuation (e.g. fig. 3) [30], [?]. These events cannot be
explained in terms of statistical deviation from conventional hadronic physics [32], [?], [33].

According to a phenomenological model [31], [?], [?], the Centauros are considered to be the
products of the hadronization of a deconfined quark-matter fireball formed in nucleus-nucleus collisions
in the upper atmosphere. The "long penetrating objects" that usually accompany them are
assumed to be long-lived strangelets, which may have been formed through a mechanism of
strangeness separation [?], [?] of the fireball's strange quark content and are emitted during the
fireball's hadronization. In a similar way, events and particles of this kind may be produced in
Pb+Pb collisions at the LHC ($\sqrt{s}$ = 5.5 TeV/nucleon) from the hadronization of a Quark Gluon
Plasma state formed in the beam fragmentation region.

3.2 Detector description 

The CASTOR detector [?], [?] is a sampling calorimeter based on the Cherenkov effect, with a tungsten
absorber and quartz fibers as active material. The signal is the Cherenkov light produced by
the charged shower particles traversing the fibers. The calorimeter is azimuthally divided into 8
octants and longitudinally segmented in layers. Each absorber layer is followed by a number of
quartz-fiber planes, together constituting a W-fiber layer. The W-fiber layers are inclined at 45°
with respect to the beam axis to achieve maximum light production. The calorimeter consists of
several channels per octant. One channel consists of a number of consecutive W-fiber layers, the
signal of which is collected and transmitted to the corresponding photomultiplier through an air-core
lightguide (fig. 2). The calorimeter is proposed as a very forward detector for the ALICE [?]
or CMS [?] experiments at the LHC, covering the pseudorapidity range $5.46 < \eta < 7.14$. Its main
objective is to search in the baryon-rich, very forward rapidity region of central Pb+Pb collisions
for unusual events and "long penetrating objects", assumed to be strangelets, by measuring the
hadronic and electromagnetic energies and the longitudinal profile of the hadronic shower on an
event-by-event basis.

Detailed simulation studies of the performance of the CASTOR calorimeter have been done
[45]. The calorimeter shows linear response to electrons and hadrons, satisfactory energy resolution
and a very narrow visible transverse size of electromagnetic and hadronic showers, a property that
derives from the detector's operating principle, based on the Cherenkov effect, which makes such
calorimeters sensitive essentially only to the shower core [?]. Concerning the lightguides,
their shape, dimensions and inner walls have been studied [?] and optimized for better light
transmission efficiency.

At this point we wish to mention the ALICE-PMD (Photon Multiplicity Detector) [49], which covers
the pseudorapidity region $1.8 < \eta < 2.6$ and is dedicated to the measurement of photon and
charged particle multiplicities. It is designed for research related to Centauro events, but from a
different viewpoint. Its objective is to detect possible large non-statistical fluctuations on an
event-by-event basis, which are the primary signature of the formation of a Disoriented Chiral Condensate
(e.g. [50]). Also, concerning strangelet research in the central rapidity region covered by the
ALICE barrel detector, a search similar to those performed in recent fixed target heavy-ion
experiments (NA52 [?], [?] at CERN-SPS and E864 [?], [?] at BNL-AGS), aiming at the
detection of particles with low charge-to-mass ratio, is foreseen [?] using the central tracking
system. (For a rigorous review of strange quark matter searches consult [57].)

3.3 "Long penetrating objects" 

The hypothesis that the long penetrating objects may be strangelets is supported by simulations
[58], which show that the passage of a strangelet through matter produces a shower that is
slowly attenuated, long penetrating and has a longitudinal profile with a many-maxima structure,
as observed in cosmic ray experiments [30], [?]. The passage of strangelets through the CASTOR
calorimeter has also been simulated [?], and the analysis of the results has shown that the signal
can be easily distinguished from the hadronic background for strangelets with energy greater than
20 TeV. Nevertheless, strangelets with such a high energy are expected to be boosted to high rapidity
(and thus pseudorapidity), and as a consequence it is very likely that they pass outside the
detector's coverage. In addition, the identification of lower energy strangelets requires a calorimeter
with very high read-out frequency, which is not feasible. In this study we present a
sophisticated method based on a neural network technique for the separation of the low energy
strangelet signal from the hadronic background.

Figure 2: schematic view of the CASTOR calorimeter; some of its air-core lightguides are shown.



4 Signal-from-background separation analysis 



The forward region ($5.46 < \eta < 7.14$) covered by the CASTOR calorimeter receives 200 ± 11 TeV
per central Pb+Pb collision at $\sqrt{s}$ = 5.5 TeV/nucleon, carried by about 2000 particles (events
generated by HIJING [60]). This amount of energy is associated with the conventional hadronic
events, treated as background for CASTOR's research interests. We make the basic assumption
that, when a strangelet is produced in a Pb+Pb collision, the energy not carried by it
goes into conventional particle production as described by the event generator. In other words,
although the detector may receive the same amount of energy in both cases, an event without a
strangelet should be considered as not interesting (background), whereas an event containing a
strangelet is signal. The only
discriminating feature between them is the fact that the strangelet's passage through the detector
gives a shower with many maxima and negligible attenuation.

We study the case of a 5 TeV strangelet, an energy which corresponds to 2.5% of the
total energy per event received by the calorimeter. This makes the separation task
non-trivial for two main reasons. First, the characteristic pattern of the longitudinal
development of the shower is weak and can easily be masked and suppressed by fluctuations of
the showers of the other hadrons. Second, the calorimeter is divided into a
reasonable (low) number of channels, so the characteristic many-maxima signal is likely
to be spread over consecutive channels and undersampled. The situation would be even worse
with a detector that was not azimuthally divided. The 8 octants of the CASTOR detector
operate as stand-alone calorimeters, since the visible transverse shower size is very narrow [?], [?].

We should also mention that we must be able to cope with an expected signal-to-background
ratio (in the raw data recorded) of the order of 1/10000. As shown in the
following, by using neural networks we can overcome these difficulties and achieve very efficient
separation of signal from background.





Figure 3: longitudinal profile of "long penetrating objects" observed in multilayer lead-emulsion
chambers (top two plots from [?], bottom one from [30]); the horizontal axis is the depth in cm Pb.



4.1 Analysis steps description 

Since the calorimeter is composed of 8 octants which operate and are read out as individual
detectors, the task of event classification consists in distinguishing an octant which contains the
signals of a strangelet and the accompanying particles (signal event) from an octant which contains
only conventional signal (background event). The neural network is fed with input variables
that are the responses of the channels of the octant and should provide a discriminating value
(NNoutput) which is close to 1 for signal events and close to 0 for background ones. A typical distribution
of signal and background events as a function of NNoutput is shown in fig. 4. By applying a
suitable cut ($NNoutput_{cut}$) we can select a subset which contains a sufficiently high number of
signal events and a low number of contaminating background events. We define the following:

signal efficiency, $\epsilon_s$, which represents the probability of classifying a real signal event as
signal,

$$ \epsilon_s = \frac{N_{signalNN}}{N_{signal}} \qquad (13) $$

where $N_{signalNN}$ is the number of events selected out of $N_{signal}$ signal events because
they produce an NNoutput greater than the
imposed cut value $NNoutput_{cut}$,

contamination, $\epsilon_b$, which represents the probability of misclassifying a background event as
signal,

$$ \epsilon_b = \frac{N_{signallikeNN}}{N_{background}} \qquad (14) $$

where $N_{signallikeNN}$ is the number of events that, although they belong to the set of the $N_{background}$
background events, produce an NNoutput value greater than $NNoutput_{cut}$ and are thus
misclassified as signal events,



signal enhancement, which represents the factor of improvement of the signal-to-background
ratio,

$$ \text{signal enhancement} = \frac{\epsilon_s}{\epsilon_b} \qquad (15) $$

Figure 4: typical signal and background distribution as a function of NNoutput. The hatched
area contains the $N_{signalNN}$ signal events that are above the selection cut $NNoutput_{cut}$. The
colored area contains the contaminating $N_{signallikeNN}$ background events.

The parameters $\epsilon_s$, $\epsilon_b$ and $\epsilon_s/\epsilon_b$ are determined during the training and testing phase of the
network and quantify its performance. By applying the trained network to e.g. a set of raw data
taken during an experimental run and assumed to have a signal-to-background ratio
$S/B = N_{signal}/N_{background}$, we obtain a subset of data to be further analyzed with a
signal-to-background ratio $(S/B)_{NN}$ enhanced by a factor of $\epsilon_s/\epsilon_b$:

$$ (S/B)_{NN} = \frac{N_{signalNN}}{N_{signallikeNN}} = \frac{N_{signal}}{N_{background}} \cdot \frac{\epsilon_s}{\epsilon_b} = \frac{\epsilon_s}{\epsilon_b} \cdot S/B \qquad (16) $$
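
The quantities defined in eqs. 13-16 can be computed directly from the network outputs of the test samples. The following Python sketch uses made-up NNoutput values and a made-up cut; the function names are assumptions of this example.

```python
def performance(signal_outputs, background_outputs, cut):
    """Signal efficiency (eq. 13), contamination (eq. 14) and signal
    enhancement (eq. 15) for a given cut on NNoutput."""
    n_signal_nn = sum(1 for o in signal_outputs if o > cut)
    n_signallike_nn = sum(1 for o in background_outputs if o > cut)
    eff = n_signal_nn / len(signal_outputs)
    cont = n_signallike_nn / len(background_outputs)
    # With no surviving background event the enhancement is unbounded.
    enhancement = eff / cont if cont > 0 else float("inf")
    return eff, cont, enhancement

def enhanced_ratio(s_over_b, eff, cont):
    """Signal-to-background ratio of the selected subset (eq. 16)."""
    return eff / cont * s_over_b

# Made-up NNoutput values for four signal and four background test events.
eff, cont, enhancement = performance([0.9, 0.8, 0.95, 0.4],
                                     [0.1, 0.2, 0.85, 0.05], cut=0.7)
print(eff, cont, enhancement)
print(enhanced_ratio(1.0e-4, eff, cont))
```

As a worked example of eq. 16, a raw-data ratio of 1/10000 combined with a signal enhancement of 1000 would give a selected subset with a signal-to-background ratio of about 1/10.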



For the construction, training and testing of the neural networks we used the environment
provided by the MLPfit package [?], a tool with great functionality for the development of multilayer
perceptrons. Other commonly used packages are Jetnet [?], SNNS [?] and NNO [?]. A set of 10000
signal events and 10000 background events, which represents the simulated response of a calorimeter
octant to the interesting and non-interesting events described above, is used for training
the networks. A second, independent set of 10000+10000 events is used in the testing
phase. Several network architectures with one hidden layer have been used, since adding a
second hidden layer does not improve the performance. Several training algorithms were initially
tried and, as expected, they caused no significant change in the final network performance.
We chose to work with the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm since it is fast,
very efficient and reliable [?].

The general specifications of the calorimeter used in this study are listed in table 1. In
order to investigate how the total calorimeter depth and the total number of channels influence
the signal detection efficiency, we studied the 9 different configurations listed in table 2. They
can be grouped into three cases, according to either the total calorimeter depth or the depth per
channel. In the following subsections we use the second categorization scheme. The results are
presented in terms of signal enhancement ($\epsilon_s/\epsilon_b$) and signal classification efficiency ($\epsilon_s$).



4.2 Calorimeter of 1.05 $\lambda_I$ per channel



We first consider the case where each channel consists of 15 consecutive W-fiber layers, which
correspond to 1.05 $\lambda_I$ per channel (= 28.8 $X_0$/channel). We studied calorimeters composed
of 9, 10 and 11 channels per octant (total depth 9.45, 10.50 and 11.55 $\lambda_I$, respectively).
Neural networks with various numbers of hidden neurons have been used. Their performance in
terms of signal enhancement ($\epsilon_s/\epsilon_b$) as a function of signal classification efficiency ($\epsilon_s$) is depicted
in figures 5, 6 and 7. Although the performance varies among different configurations, in all
cases a signal enhancement higher than 1000 can be achieved at satisfactorily high efficiency. The
results from each neural network configuration are listed in detail in the results table
and discussed in subsection 4.5.

Table 1: calorimeter specifications.

absorber:       W ($\lambda_I$ = 10.0 cm, $X_0$ = 0.365 cm, density = 18.5 g/cm^3)
                maximum 170 layers, 0.5 cm thick each (= 0.071 $\lambda_I$ after 45° inclination)
fiber:          quartz core (diameter 600 μm), hard plastic cladding (diameter 630 μm)
                numerical aperture = 0.37
                3 fiber planes per absorber layer (= 1 W-fiber layer)
filling ratio:  fiber volume / absorber volume = 26.5%
channels:       configurations with 7, 10, 15 consecutive W-fiber layers per channel

Table 2: calorimeter configurations. Entries give the number of channels (total number of layers)
per octant for each total calorimeter depth.

$\lambda_I$ (layers)     total depth        total depth        total depth
per channel              ~9.3 $\lambda_I$   ~10.5 $\lambda_I$  ~11.7 $\lambda_I$
0.49 ( 7)                19 (133)           21 (147)           24 (168)
0.70 (10)                13 (130)           15 (150)           17 (170)
1.05 (15)                 9 (135)           10 (150)           11 (165)
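
The depths quoted for the nine configurations follow from simple arithmetic on the values of tables 1 and 2, which the Python sketch below reproduces. It assumes 0.07 $\lambda_I$ per W-fiber layer, the value implied by table 2 (1.05 $\lambda_I$ for 15 layers; table 1 quotes the slightly more precise 0.071), and converts to radiation lengths with the ratio $\lambda_I / X_0$ = 10.0 cm / 0.365 cm from table 1. The constant and function names are choices made for this example.

```python
LAMBDA_PER_LAYER = 0.07        # interaction lengths per W-fiber layer (implied by table 2)
X0_PER_LAMBDA = 10.0 / 0.365   # radiation lengths per interaction length (table 1)

def channel_depth(layers_per_channel):
    """Depth of one channel in interaction lengths."""
    return layers_per_channel * LAMBDA_PER_LAYER

def channel_depth_x0(layers_per_channel):
    """Depth of one channel in radiation lengths."""
    return channel_depth(layers_per_channel) * X0_PER_LAMBDA

def total_depth(channels, layers_per_channel):
    """Total calorimeter depth in interaction lengths."""
    return channels * channel_depth(layers_per_channel)

# The nine configurations of table 2: layers per channel -> channels per octant.
configurations = {7: (19, 21, 24), 10: (13, 15, 17), 15: (9, 10, 11)}
for layers, channel_counts in configurations.items():
    for channels in channel_counts:
        print(layers, channels, channels * layers,
              round(total_depth(channels, layers), 2),
              round(channel_depth_x0(layers), 1))
```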

4.3 Calorimeter of 0.70 $\lambda_I$ per channel

In this case each channel consists of 10 consecutive W-fiber layers, which correspond to a depth of
0.70 $\lambda_I$ per channel (= 19.2 $X_0$/channel). The calorimeters that have been studied are composed
of 13, 15 and 17 channels per octant (total depth 9.10, 10.50 and 11.90 $\lambda_I$, respectively). The
same procedure as in the previous case has been followed: neural networks with various numbers
of hidden neurons have been studied. Their performance is shown in figures 8, 9 and 10. We
can still achieve high signal enhancement and efficiency. Although the channel depth in this
case is reduced by 28.6% compared to the previous one, we observe that the performance is
not significantly improved. The results from each neural network configuration are listed
in detail in the results table and discussed in subsection 4.5.

4.4 Calorimeter of 0.49 λ_I per channel 

We also studied the case where each channel consists of 7 consecutive W-fiber layers, corresponding 
to a depth of 0.49 λ_I per channel (= 13.4 X_0/channel). Calorimeters of total depth 9.31, 10.29 and 
11.76 λ_I have been studied (composed of 19, 21 and 24 channels per octant, respectively). A 
significantly improved performance should be reached to justify the high cost due to the large number 
of channels. We followed the same procedure as in the previous cases. The performance of the 
neural networks that have been used is depicted in figures 11, 12 and 13. The signal enhancement 
at fixed efficiency ε_s = 0.96 and the efficiency ε_s at a fixed signal enhancement of 1000 are 
presented for each NN architecture in table 3. In general, although the performance in terms of 
achievable signal enhancement and efficiency is very satisfactory, it is not considerably better than 
that of the 1.05 or 0.70 λ_I/channel cases. 

Figure 5: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
9 channels/octant and total depth of 9.45 λ_I. 

Figure 6: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
10 channels/octant and total depth of 10.50 λ_I. 

Figure 7: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
11 channels/octant and total depth of 11.55 λ_I. 

Figure 8: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
13 channels/octant and total depth of 9.10 λ_I. 

Figure 9: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
15 channels/octant and total depth of 10.50 λ_I. 

Figure 10: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
17 channels/octant and total depth of 11.90 λ_I. 

Figure 11: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
19 channels/octant and total depth of 9.31 λ_I. 

In the following we discuss the results of the various calorimeter configurations and neural network 
architectures that have been studied. 

4.5 Results recapitulation 

The signal enhancement at ε_s = 0.96 and the signal classification efficiency ε_s at a signal 
enhancement of 1000 are presented for each neural network architecture and calorimeter 
configuration in table 3. As proposed in [?], we average the performance of the different neural 
networks which correspond to the same calorimeter configuration. The results are tabulated in 
table 4 and plotted in figures 14 and 15 (points are grouped according to depth per channel and 
total calorimeter depth, respectively). We observe that the signal enhancement increases with 
increasing calorimeter depth and/or decreasing depth per channel. The signal efficiency stays 
essentially unchanged; thus the use of a deep calorimeter with frequent read-out results in more 
efficient background discrimination (and hence lower contamination). 
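The averaging step can be sketched as follows, using the per-network values listed in table 3. A plain arithmetic mean over the networks of each configuration is assumed here; the averaging prescription of the cited reference may differ in detail. The printed numbers are recomputed from table 3 by this sketch and are not copied from table 4.

```python
from statistics import mean

# (depth per channel in lambda_I, channels per octant) ->
#   list of (hidden neurons, enhancement at eff_s = 0.96, eff_s at enhancement = 1000)
TABLE_3 = {
    (1.05, 9):  [(3, 1921, 0.966), (6, 3842, 0.971), (9, 2800, 0.969), (12, 3119, 0.970)],
    (1.05, 10): [(6, 1281, 0.966), (9, 2572, 0.971), (12, 3197, 0.971), (15, 2799, 0.967)],
    (1.05, 11): [(3, 6435, 0.975), (6, 7190, 0.973), (9, 6720, 0.973),
                 (12, 6410, 0.973), (15, 6404, 0.973)],
    (0.70, 13): [(9, 3198, 0.971), (12, 2404, 0.972), (15, 3199, 0.974)],
    (0.70, 15): [(12, 6399, 0.972), (15, 4796, 0.970), (18, 6401, 0.975)],
    (0.70, 17): [(12, 3996, 0.970), (15, 3464, 0.970), (18, 4809, 0.972)],
    (0.49, 19): [(6, 5574, 0.974), (9, 3832, 0.970), (12, 4570, 0.972), (15, 4809, 0.976)],
    (0.49, 21): [(6, 6418, 0.978), (9, 8019, 0.979), (12, 8004, 0.976), (15, 6424, 0.979)],
    (0.49, 24): [(6, 3206, 0.973), (9, 9615, 0.975), (12, 5113, 0.970), (15, 3995, 0.972)],
}

def average_performance(table):
    """Average enhancement and efficiency over the networks of each configuration."""
    return {
        config: (mean(row[1] for row in rows), mean(row[2] for row in rows))
        for config, rows in table.items()
    }

for (depth, channels), (enh, eff) in average_performance(TABLE_3).items():
    print(f"{depth:.2f} lambda_I/channel, {channels:2d} channels: "
          f"enhancement = {enh:7.1f}, eff_s = {eff:.4f}")
```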

Generally, in all calorimeter configurations that have been studied, the signal-background 
classification task as performed by neural networks can provide a signal-over-background enhancement 
factor larger than 2000 (1000) at a high signal classification efficiency, ε_s, of 0.96 (0.97). 
This performance is considered very satisfactory and can be achieved even with a moderately 
short calorimeter (9.45 λ_I deep) with a conservative number of channels (9 channels per octant, 
1.05 λ_I/channel). In the case of very frequent read-out (0.49 λ_I/channel), the performance in 
terms of achievable signal enhancement and efficiency is improved, but it is still not considerably 
better than in the cases of 1.05 or 0.70 λ_I/channel read-out, and so does not justify the higher 
cost of the large total number of calorimeter channels. If we had to choose one of the 9 
configurations that have been studied, we would suggest either a calorimeter 11.55 λ_I deep with 
11 channels per octant (1.05 λ_I/channel) or one 10.50 λ_I deep with 15 channels per 
octant (0.70 λ_I/channel), since they both combine an adequate number of channels and total depth 
with high performance and cost efficiency. 

Figure 12: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
21 channels/octant and total depth of 10.29 λ_I. 

Figure 13: signal enhancement as a function of signal efficiency (ε_s) for a calorimeter with 
24 channels/octant and total depth of 11.76 λ_I. 



5 Summary and conclusions 



We presented a signal-from-background separation study based on the neural network technique. 
We used a multilayer perceptron with one hidden layer, whose input variables were the channel 
responses of each octant of the CASTOR calorimeter. The network was trained to distinguish 
between an octant containing the characteristic pattern of the longitudinal development of the 
shower of a "long-penetrating object" (strangelet) and an octant containing only signals from 
conventional particle showers. 
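As an illustration of this architecture, the forward pass of a one-hidden-layer perceptron acting on the channel responses of one octant could be sketched as below. The sigmoid activation, the random weights and the example channel responses are assumptions made for the sketch; they are not the trained network or the simulated data of this study. The 9 inputs and 6 hidden neurons correspond to one of the architectures listed in table 3.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_output(channels, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a perceptron with one hidden layer and one output neuron."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, channels)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

n_inputs, n_hidden = 9, 6            # 9 channels per octant, 6 hidden neurons
rng = random.Random(0)               # untrained, random weights (illustration only)
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# Hypothetical normalized channel responses of one octant, front to back.
octant = [0.05, 0.20, 0.30, 0.15, 0.10, 0.08, 0.06, 0.04, 0.02]
print(f"network output: {mlp_output(octant, w_hidden, b_hidden, w_out, b_out):.3f}")
```

After training, an output close to 1 would flag the octant as signal-like and an output close to 0 as background-like, with the cut on the output setting the working point.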

We concentrated on the case of a strangelet with an energy of 5 TeV, which corresponds to 
2.5% of the total energy per event that is expected to be received by the calorimeter. We studied 
calorimeter configurations with various total depths and depths per channel. The results show 
that we can very efficiently separate the signal from the background and compensate for the initial 
signal-to-background ratio, which is expected to be of the order of 1/10000. The neural-network-based 
classification can provide a signal-over-background enhancement factor larger than 
2000 (1000) at a signal classification efficiency as high as 0.96 (0.97), thus resulting in a selected 
subset of events, to be further analyzed, with a significantly improved signal-to-background ratio 
of the order of 0.1 or higher. We stress that this performance is achieved without any 
preprocessing or preselection procedure; an even higher signal-to-background ratio might therefore 
be accomplished if such steps were added. 
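The quoted signal-to-background ratio after selection follows from the numbers above, under the assumption that the enhancement factor multiplies the initial signal-to-background ratio. A short check, with illustrative function names:

```python
def selected_signal_to_background(initial_ratio, enhancement):
    """S/B of the selected sample, assuming the enhancement is multiplicative."""
    return initial_ratio * enhancement

def total_energy_tev(strangelet_energy_tev, energy_fraction):
    """Total energy per event implied by the strangelet energy and its share."""
    return strangelet_energy_tev / energy_fraction

initial_ratio = 1.0 / 10000          # expected initial S/B, order of magnitude
for enhancement, eff_s in ((2000, 0.96), (1000, 0.97)):
    ratio = selected_signal_to_background(initial_ratio, enhancement)
    print(f"enhancement {enhancement} at eff_s = {eff_s}: S/B after selection ~ {ratio:.1f}")

# 5 TeV being 2.5% of the total implies about 200 TeV per event in the calorimeter.
print(f"implied total energy: {total_energy_tev(5.0, 0.025):.0f} TeV")
```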

Concerning the optimum calorimeter configuration in terms of total depth and read-out frequency, 
we conclude that a total depth between 10.50 and 11.55 λ_I with 0.70-1.05 λ_I/channel (10 
to 15 channels per octant) is sufficient to ensure high classification performance. The fact that 
such good performance is obtained for a 5 TeV strangelet leads us to conclude that similar or 
even better performance can be achieved for strangelets of higher energy, since their signal will be 
stronger and more pronounced. 



Table 3: signal enhancement at ε_s = 0.96 and signal efficiency ε_s at a signal enhancement of 1000 
for the studied configurations and neural network architectures. 

calorimeter         inputs          hidden     enhancement        ε_s at 
read-out            (= channels)    neurons    at ε_s = 0.96      enhancement = 1000 

1.05 λ_I/channel     9               3         1921               0.966 
                                     6         3842               0.971 
                                     9         2800               0.969 
                                    12         3119               0.970 
                    10               6         1281               0.966 
                                     9         2572               0.971 
                                    12         3197               0.971 
                                    15         2799               0.967 
                    11               3         6435               0.975 
                                     6         7190               0.973 
                                     9         6720               0.973 
                                    12         6410               0.973 
                                    15         6404               0.973 

0.70 λ_I/channel    13               9         3198               0.971 
                                    12         2404               0.972 
                                    15         3199               0.974 
                    15              12         6399               0.972 
                                    15         4796               0.970 
                                    18         6401               0.975 
                    17              12         3996               0.970 
                                    15         3464               0.970 
                                    18         4809               0.972 

0.49 λ_I/channel    19               6         5574               0.974 
                                     9         3832               0.970 
                                    12         4570               0.972 
                                    15         4809               0.976 
                    21               6         6418               0.978 
                                     9         8019               0.979 
                                    12         8004               0.976 
                                    15         6424               0.979 
                    24               6         3206               0.973 
                                     9         9615               0.975 
                                    12         5113               0.970 
                                    15         3995               0.972 



Table 4: average signal enhancement (at ε_s = 0.96) and average signal efficiency (ε_s at a signal 
enhancement of 1000) for the studied calorimeter configurations. 

Figure 14: average signal enhancement (at ε_s = 0.96) and efficiency (at a signal enhancement of 
1000) as a function of total calorimeter depth for different channel configurations. A trendline is 
shown to guide the eye. 

Figure 15: average signal enhancement (at ε_s = 0.96) and efficiency (at a signal enhancement of 
1000) as a function of depth per channel for different total calorimeter depths. A trendline is shown 
to guide the eye. 